Data Validation


Scenarios Engineering driven Autonomous Transportation in Open-Pit Mines

Teng, Siyu, Li, Xuan, Li, Yuchen, Li, Lingxi, Ai, Yunfeng, Chen, Long

arXiv.org Artificial Intelligence

One critical bottleneck impeding the development and deployment of autonomous transportation in open-pit mines is guaranteeing robustness and trustworthiness in prohibitively extreme scenarios. In this research, a novel scenarios engineering (SE) methodology is proposed for autonomous mining trucks in open-pit mines. SE increases the trustworthiness and robustness of autonomous trucks through four key components: Scenario Feature Extractor, Intelligence & Index (I&I), Calibration & Certification (C&C), and Verification & Validation (V&V). The Scenario Feature Extractor is a comprehensive pipeline that captures complex interactions and latent dependencies in complex mining scenarios. I&I effectively enhances the quality of the training dataset, thereby establishing a solid foundation for autonomous transportation in mining areas. C&C is grounded in the intrinsic regulation, capabilities, and contributions of the intelligent systems employed in autonomous transportation, aligning them with traffic participants in the real world and ensuring their performance through certification. The V&V process ensures that the autonomous transportation system is correctly implemented, while validation focuses on evaluating the ability of the well-trained model to operate efficiently in the complex and dynamic conditions of open-pit mines. This methodology addresses the unique challenges of autonomous transportation in open-pit mining, promoting productivity, safety, and performance in mining operations.


Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series

Koh, Woosung, Choi, Insu, Jang, Yuntae, Kang, Gimin, Kim, Woo Chang

arXiv.org Artificial Intelligence

Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, minimal research has been done on applying these ideas to control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum learning via data augmentation, while imitation learning is implemented via policy distillation from an oracle. Our findings reveal that curriculum learning should be considered a novel direction for improving control-task performance over complex time series. Our extensive out-of-sample experiments across random seeds, together with ablation studies, are highly encouraging for curriculum learning in time-series control. These findings are especially encouraging given that we tune all overlapping hyperparameters on the baseline, giving the baseline an advantage. On the other hand, we find that imitation learning should be used with caution.
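The core idea of curriculum learning is to order training experience from easy to hard. As a minimal hedged illustration (not the authors' implementation; the window size and the volatility-based difficulty proxy are illustrative assumptions), one simple way to build such a curriculum over a time series is to present low-volatility segments before high-volatility ones:

```python
import statistics

def curriculum_windows(series, window=20):
    """Split a time series into fixed-size windows and order them
    easy-to-hard, proxying 'difficulty' by windowed volatility.
    The window size and difficulty proxy are illustrative choices."""
    windows = [series[i:i + window]
               for i in range(0, len(series) - window + 1, window)]
    # Low-volatility (easier) windows come first in the curriculum.
    return sorted(windows, key=statistics.stdev)
```

A training loop would then feed these windows to the agent in order, optionally re-sorting as the policy improves.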


YOLOv5: Violation Detection on the Roadside of Toll Roads

#artificialintelligence

Based on Indonesia's Government Regulation No. 15 of 2005, Article 1, Paragraph (2), toll roads are public roads that are part of the road network system and national roads whose users are required to pay tolls. Toll roads can also be referred to as expressways. However, despite the many advantages of using toll roads as transportation routes, many accidents still occur on them. The high number of accidents on toll roads is mostly caused by human negligence. An expert researcher at the Center for Transportation and Logistics Studies (PUSTRAL) at UGM cited several factors causing accidents on toll roads, including driver negligence, vehicle condition, the environment and road conditions, and weather.


Data validation in Python: a look into Pandera and Great Expectations

#artificialintelligence

Liam studied for an MSci in Physics at University College London, which included modules on Statistical Data Analysis, High Performance Computing, Practical Physics and Computing. This led to a dissertation exploring the use of machine learning techniques for analysing LHC particle collision data. Before joining endjin, Liam had a keen interest in data science and engineering, and completed a number of related internships. Since joining endjin, however, he has developed a much broader set of interests, including DevOps and more general software engineering. He is currently exploring those interests and finding his feet in the tech space.


Data Validation and Data Verification – From Dictionary to Machine Learning - KDnuggets

#artificialintelligence

Quite often, we use data verification and data validation interchangeably when we talk about data quality. However, these two terms are distinct. Table 1 explains the dictionary meanings of the words verification and validation with a few examples. To summarize, verification is about truth and accuracy, while validation is about supporting the strength of a point of view or the correctness of a claim. Validation checks the correctness of a methodology, while verification checks the accuracy of the results. Now that we understand the literal meaning of the two words, let's explore the difference between "data verification" and "data validation".
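To make the distinction concrete, here is a minimal hypothetical sketch (the field names, ranges, and trusted source are illustrative assumptions, not from the article): data validation checks that incoming records conform to an expected schema before use, while data verification checks stored values against a trusted reference for accuracy.

```python
def validate_record(record):
    """Data validation: check that a record conforms to the expected
    schema and value ranges before it is used (hypothetical schema)."""
    errors = []
    if not isinstance(record.get("age"), int):
        errors.append("age must be an integer")
    elif not 0 <= record["age"] <= 130:
        errors.append("age out of range")
    if record.get("email", "").count("@") != 1:
        errors.append("email must contain exactly one '@'")
    return errors

def verify_record(record, trusted_source):
    """Data verification: check that stored values match a trusted
    reference, i.e. that the data is accurate, and report mismatches."""
    return {k: v for k, v in record.items()
            if trusted_source.get(k) != v}
```

Note that a record can pass validation (well-formed, in range) and still fail verification (simply wrong), which is exactly the distinction the article draws.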


Comparing Shape-Constrained Regression Algorithms for Data Validation

Bachinger, Florian, Kronberger, Gabriel

arXiv.org Artificial Intelligence

Industrial and scientific applications handle large volumes of data that render manual validation by humans infeasible. Therefore, we require automated data validation approaches that can incorporate the prior knowledge of domain experts to produce dependable, trustworthy assessments of data quality. Prior knowledge is often available as rules that describe interactions of inputs with regard to the target, e.g., the target must be monotonically decreasing and convex over increasing input values. Domain experts are able to validate multiple such interactions at a glance. However, existing rule-based data validation approaches are unable to consider these constraints. In this work, we compare different shape-constrained regression algorithms for the purpose of data validation, based on their classification accuracy and runtime performance.
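The paper fits regression models under such shape constraints; as a much simpler hedged illustration of the constraint itself (not the paper's method; the function name and tolerance are assumptions), the following checks raw data points against an expected monotonically-decreasing relationship:

```python
def monotone_decreasing_violations(xs, ys, tol=1e-9):
    """Return the x-pairs where the target *increases* with the input,
    violating an expected monotonically-decreasing shape constraint.
    An empty list means the data is consistent with the constraint."""
    points = sorted(zip(xs, ys))
    return [(points[i][0], points[i + 1][0])
            for i in range(len(points) - 1)
            if points[i + 1][1] > points[i][1] + tol]
```

This naive point-wise check is brittle under measurement noise, which is one motivation for instead fitting a smooth model constrained to the expected shape, as the paper does.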


7 Considerations Before Pushing Machine Learning Models to Production

#artificialintelligence

Being part of a company that values scalability, as a data scientist I see daily the challenges that come with putting AI-based solutions into production. These challenges are numerous and cover a variety of aspects: modeling and system design, data engineering, resource management, SLAs, etc. I don't claim mastery of any of those fields. I do, however, know that applying some software engineering principles and using the right tools helped me a lot in making my work reproducible and ready for production. In this article, I'll share with you 7 of the considerations I keep in mind before productionizing my models.


Serving a Machine Learning Model with FastAPI and Streamlit

#artificialintelligence

Machine learning is a hot topic at present. With technology companies moving in the direction of artificial intelligence and machine learning to cash in early, the field has grown tremendously. Many of these companies create their own machine learning solutions and sell them to others using a subscription-based model. Since the majority of machine learning models are developed in Python, the web frameworks that serve them are usually Python-based as well. For a long time, Flask, a micro-framework, was the go-to choice.


Data Validation in Machine Learning is Imperative, Not Optional - KDnuggets

#artificialintelligence

Operationalizing a machine learning (ML) model in production requires much more than just creating and validating models as in academia or research. The ML application in production can be a pipeline with multiple components running consecutively, as shown in Fig 1. Before we reach model training in the pipeline, various components such as data ingestion, data versioning, data validation, and data pre-processing need to be executed. Data validation means checking the accuracy and quality of source data before training a new model version. It ensures that anomalies that are infrequent or manifested only in incremental data are not silently ignored.
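As a hedged sketch of what such a pipeline stage might do (the statistic, threshold, and function names are illustrative assumptions, not from the article), a validation gate can compare summary statistics of incoming data against those recorded at the previous training run and block training when drift is excessive:

```python
def validation_gate(stats_baseline, stats_new, max_drift=0.2):
    """Minimal data-validation gate (hypothetical thresholds): compare
    per-column mean statistics of incoming data against the previous
    training run and report columns whose drift exceeds the tolerance."""
    report = {}
    for col, base in stats_baseline.items():
        new = stats_new.get(col)
        if new is None:
            report[col] = "missing column"
        elif base != 0 and abs(new - base) / abs(base) > max_drift:
            report[col] = f"mean drifted {base} -> {new}"
    return report  # empty report -> safe to proceed to model training
```

A pipeline would run this after data ingestion and either halt or alert on a non-empty report, so that anomalies in incremental data are surfaced rather than silently ignored.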


Marketing Data Scientist (copy)

#artificialintelligence

As an industry leader and Software-as-a-Service provider our mission at 8x8, Inc. [NYSE: EGHT] is to transform the future of business communications. The 8x8 Open Communications Platform (TM) uniquely brings …